-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pick up Propolis 50cb28f5 #5906
Conversation
sled-agent/src/common/instance.rs
Outdated
// Decide which of Propolis's reported migrations to pay attention to. | ||
// If this Propolis is currently a migration target (evidenced by its ID | ||
// appearing in the `dst_propolis_id` field in the instance record), | ||
// look at the current migration in. Otherwise, look at the reported | ||
// migration out. | ||
let role = if instance_runtime | ||
.dst_propolis_id | ||
.is_some_and(|id| id == propolis_id) | ||
{ | ||
MigrationRole::Target | ||
} else { | ||
MigrationRole::Source | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is going to have to change substantially once sled-agent no longer has an InstanceRuntimeState
to make this determination. I think we ought to be able to do this instead using the migration_state
in InstanceStates
, which should also have the ID of the currently active migration (which would let us check whether the migration in or migration out refers to the current migration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, thanks! d9b4ae8 should clean this up a bit.
these clippy failures should be fixed by #5907, btw, so rebasing should fix that. |
This looks good in local testing (I've gone through a few instance state transitions and checked to make sure that instance reboot works properly now that the fix for oxidecomputer/propolis#709 is in place). Still need to pick up the fix for #5907 in this branch, but otherwise this is ready for review. |
looks great! thanks for changing it to use the |
I appear to have upset the migration integration tests; will investigate. |
The problem is an assertion failure caused by d9b4ae8 (newest commit in this PR). The simplest fix may just be to revert that commit, but I'll think about it some more. The basic issue is that d9b4ae8 breaks this assumption by making This is, of course, way more complicated than it needs to be: sled agent shouldn't be in the business of updating instance runtime state at all, and if it weren't there would be no assertions to trip. Unfortunately, we're not there quite yet, so we'll need to stitch something together to keep this working in the meantime. |
d9b4ae8
to
26cb441
Compare
The instance runtime state machine in sled agent requires that an inbound migration appear to be completed only once (so that there's only one "move the destination Propolis ID to the current Propolis ID and clear the destination ID" operation per successful migration). This operation (clearing the instance's active migration ID) is itself what prevents future Propolis state updates from being interpreted as a completed migration (once the migration ID is cleared from sled agent's runtime state, it will no longer match the old ID that Propolis is reporting). In the brave new world where sled agent doesn't track instance runtime state at all, substantially all of this code will be removed: there will be no instance runtime state to transition to anything; sled agent will just pass migration statuses straight through to Nexus.
I haven't rebased this PR on top of #5907 yet (so as not to lose my place in the buildomat queue), but I've done so locally, run |
26cb441
to
ecc9c33
Compare
As we discussed on Matrix, I'm fine with the change back to using |
Update to Propolis commit 50cb28f5 to obtain the following updates:
50cb28f5 changes the Propolis instance status and migration status APIs to report both inbound and outbound migration status. Update sled agent accordingly.
Tests:
cargo nextest
; will also run a couple of instances on a dev cluster before merging.